📈 RAG Performance Evaluation Metrics: Rank-Aware Metrics - mAP, mRR, nDCG

Evaluating the performance of a RAG system is essential for measuring its effectiveness and user satisfaction. Several metrics quantify how early and how accurately the retriever surfaces the information a user is looking for. The most representative are mAP (Mean Average Precision), mRR (Mean Reciprocal Rank), and nDCG (Normalized Discounted Cumulative Gain). Each metric looks at ranking quality from a different angle and serves a different purpose, so it is important to choose and interpret the metric that fits the system's characteristics and evaluation goals. 😊

1. mRR (Mean Reciprocal Rank)

Definition

mRR (Mean Reciprocal Rank) takes, for each of several queries, the reciprocal of the rank at which the first correct answer (a highly relevant item) appears in the retrieved result list, called the Reciprocal Rank (RR), and then averages these values over all queries. In other words, it focuses on how high in the list the system places the 'single correct answer' the user is looking for; relevant items ranked after the first one do not affect the score.

Explanation
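One common way to write this as a formula, assuming |Q| queries and rank_i as the 1-based position of the first relevant result for query i (by a common convention, RR_i = 0 when no relevant result is retrieved at all):

$$
\mathrm{RR}_i = \frac{1}{\mathrm{rank}_i}, \qquad
\mathrm{mRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \mathrm{RR}_i
$$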

Example

Suppose that for three queries, the first relevant result appears at rank 1, rank 3, and rank 2, respectively.

In this case, mRR = (1 + 1/3 + 1/2) / 3 = (6/6 + 2/6 + 3/6) / 3 = (11/6) / 3 = 11/18 ≈ 0.611. Roughly speaking, this system places the first correct answer between ranks 1 and 2 on average (1/mRR = 18/11 ≈ 1.64, which is the harmonic mean of the three ranks).
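A minimal Python sketch of the mRR calculation above. The function names are illustrative, and representing "no relevant result retrieved" as `None` (scored as RR = 0) is an assumption of this sketch. `fractions.Fraction` keeps the arithmetic exact, so the result matches the hand calculation.

```python
from fractions import Fraction


def reciprocal_rank(first_relevant_rank):
    """1 / rank for a 1-based rank; None (no relevant result retrieved) scores 0."""
    if first_relevant_rank is None:
        return Fraction(0)
    return Fraction(1, first_relevant_rank)


def mean_reciprocal_rank(first_relevant_ranks):
    """Average of the per-query reciprocal ranks."""
    if not first_relevant_ranks:
        raise ValueError("need at least one query")
    total = sum((reciprocal_rank(r) for r in first_relevant_ranks), Fraction(0))
    return total / len(first_relevant_ranks)


# Rank of the first relevant result for each of the three example queries.
ranks = [1, 3, 2]
mrr = mean_reciprocal_rank(ranks)
print(mrr, round(float(mrr), 3))  # 11/18 0.611
```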

2. mAP (Mean Average Precision)

Definition

mAP (Mean Average Precision) is the mean of the per-query Average Precision (AP) values. AP evaluates how well the relevant items are ranked near the top of the result list, taking both precision and recall into account: it averages the precision measured at the position of each relevant item. In other words, it measures the ability to find all of several relevant items and rank them highly.

Explanation
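A standard way to write AP for a single query, assuming R is the total number of relevant documents for that query, n is the number of returned results, P(k) is the precision of the top k results, and rel(k) is 1 if the result at rank k is relevant and 0 otherwise:

$$
\mathrm{AP} = \frac{1}{R} \sum_{k=1}^{n} P(k)\,\mathrm{rel}(k), \qquad
\mathrm{mAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}_q
$$

Relevant documents that are never retrieved still count in R, so they pull AP down. Some tools use a different denominator (for example, for AP@k), so reported values can differ between implementations.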

Example

Suppose a query has 5 relevant documents in total, and the system returns 10 results whose order and relevance are as follows:
[R, N, R, R, N, N, R, N, R, N] (R: Relevant, N: Not Relevant)

  1. Rank 1 (R): Precision@1 = 1/1 = 1.0 (1 of 5 relevant documents found)
  2. Rank 3 (R): Precision@3 = 2/3 ≈ 0.67 (2 of 5 relevant documents found)
  3. Rank 4 (R): Precision@4 = 3/4 = 0.75 (3 of 5 relevant documents found)
  4. Rank 7 (R): Precision@7 = 4/7 ≈ 0.57 (4 of 5 relevant documents found)
  5. Rank 9 (R): Precision@9 = 5/9 ≈ 0.56 (5 of 5 relevant documents found)

์ด ์ฟผ๋ฆฌ์˜ AP = (1.0 + 0.67 + 0.75 + 0.57 + 0.56) / 5 โ‰ˆ 3.55 / 5 = 0.71
๋งŒ์•ฝ ์—ฌ๋Ÿฌ ์ฟผ๋ฆฌ์— ๋Œ€ํ•ด AP ๊ฐ’์„ ๊ณ„์‚ฐํ•˜์—ฌ ํ‰๊ท ์„ ๋‚ด๋ฉด ๊ทธ๊ฒƒ์ด mAP๊ฐ€ ๋ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด 3๊ฐœ์˜ ์ฟผ๋ฆฌ์— ๋Œ€ํ•œ AP ๊ฐ’์ด ๊ฐ๊ฐ 0.71, 0.5, 0.8 ์ด๋ผ๋ฉด, mAP = (0.71 + 0.5 + 0.8) / 3 โ‰ˆ 0.67 ์ž…๋‹ˆ๋‹ค.
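A minimal Python sketch of the AP and mAP calculations above, assuming binary relevance labels and that the AP denominator is the total number of relevant documents for the query (5 here), as in the example. The function names are illustrative, and returning 0 when a query has no relevant documents is a convention assumed by this sketch.

```python
def average_precision(is_relevant, total_relevant):
    """AP for one ranked list of binary labels (True = relevant).

    total_relevant counts every relevant document for the query, including
    ones that were never retrieved (they add nothing to the sum).
    """
    if total_relevant == 0:
        return 0.0  # assumed convention: no relevant documents -> AP = 0
    hits = 0
    precision_sum = 0.0
    for k, relevant in enumerate(is_relevant, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / k  # precision of the top k results
    return precision_sum / total_relevant


def mean_average_precision(ap_values):
    """Plain average of per-query AP values."""
    if not ap_values:
        raise ValueError("need at least one query")
    return sum(ap_values) / len(ap_values)


labels = "R N R R N N R N R N".split()
ap = average_precision([label == "R" for label in labels], total_relevant=5)
print(round(ap, 2))  # 0.71

# AP values for the other two queries are the ones given in the text.
print(round(mean_average_precision([ap, 0.5, 0.8]), 2))  # 0.67
```

Using the unrounded AP (about 0.709) instead of 0.71 still gives mAP ≈ 0.67.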

3. nDCG (Normalized Discounted Cumulative Gain)

Definition

**nDCG (Normalized Discounted Cumulative Gain)** is a metric for evaluating the ranking quality of search results, and it is especially useful when relevance comes in degrees (graded relevance). It gives more weight to higher-ranked results by discounting the gain of results further down the list (Discounted), then divides by the score of the ideal (best possible) ordering (Normalized), yielding a value between 0 and 1.

Explanation
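A commonly used linear-gain formulation, assuming rel_i is the graded relevance of the result at 1-based rank i and IDCG@k is the DCG@k of the ideal ordering (the judged documents for the query sorted by relevance, highest first):

$$
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{\mathrm{rel}_i}{\log_2(i+1)}, \qquad
\mathrm{nDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
$$

Another widely used variant replaces rel_i with 2^(rel_i) - 1, which rewards highly relevant results more strongly; the two variants generally give different nDCG values for the same ranking.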

Example

Suppose the relevance scores (0 to 3) of the top 5 results for a query are:
[3, 2, 3, 0, 1]

The system presented its top 5 results in an order close to the ideal one, [3, 3, 2, 1, 0] (nDCG@5 ≈ 0.972 with the linear gain rel_i / log2(i + 1)).
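A minimal Python sketch of nDCG@k with the linear gain rel_i / log2(i + 1). As a simplifying assumption, the ideal ordering is built by sorting the same five scores; in a full evaluation, IDCG would be computed from all judged documents for the query. The function names are illustrative. With this formulation the example evaluates to about 0.972.

```python
import math


def dcg_at_k(scores, k):
    """DCG@k with linear gain: sum of rel_i / log2(i + 1) over 1-based ranks i <= k."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(scores[:k], start=1))


def ndcg_at_k(scores, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering.

    Simplification: the ideal ordering is built from the same scores; a full
    evaluation would sort all judged documents for the query.
    """
    idcg = dcg_at_k(sorted(scores, reverse=True), k)
    return dcg_at_k(scores, k) / idcg if idcg > 0 else 0.0


scores = [3, 2, 3, 0, 1]
print(round(dcg_at_k(scores, 5), 3))   # 6.149
print(round(ndcg_at_k(scores, 5), 3))  # 0.972
```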

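4. Which metric should you use, and when? 🤔

Each metric emphasizes a particular aspect of an information retrieval system: mRR rewards placing the first correct answer as high as possible, mAP rewards ranking all relevant items high (with binary relevance), and nDCG rewards putting the most relevant items first when relevance is graded. It is therefore advisable to choose the metric that best matches the system's goals and users' needs, or to use several metrics together for a multifaceted evaluation. 👍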


Related notes:
Fundamentals of information retrieval systems
Understanding Precision and Recall
Guide to choosing evaluation metrics: classification vs. retrieval problems
Recommender system evaluation metrics: RMSE, MAE, Precision@k, Recall@k
Evaluating Question Answering systems
Overview of ranking algorithms: from PageRank to Learning to Rank
Binary Relevance vs Graded Relevance
Comparing search algorithm performance with A/B testing
The relationship between offline evaluation metrics and online user satisfaction
Improving search quality with user log data

๐Ÿท๏ธ: ์ •๋ณด ๊ฒ€์ƒ‰ ํ‰๊ฐ€ ์ง€ํ‘œ mAP mRR nDCG ์„ฑ๋Šฅ ์ธก์ • ๊ธฐ๊ณ„ ํ•™์Šต ๋ฐ์ดํ„ฐ ๊ณผํ•™ ๊ฒ€์ƒ‰ ์—”์ง„ ์ถ”์ฒœ ์‹œ์Šคํ…œ ๋žญํ‚น